Runtime Data Flow Scheduling of Matrix Computations
Abstract
We investigate the scheduling of matrix computations expressed as directed acyclic graphs for shared-memory parallelism. Because of the data granularity in this problem domain, even slight variations in load balance or data locality can greatly affect performance. Well-known scheduling algorithms such as work stealing have proven time and space bounds, but these bounds do not provide a discernible indicator of the relative performance of different scheduling algorithms and heuristics. We provide a flexible framework for scheduling matrix computations, which we use to compare scheduling algorithms empirically. By building software solutions on hardware techniques, namely by leveraging the cache coherence protocol, we develop a scheduling algorithm that addresses load balance and data locality simultaneously and show its performance benefits.
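As a concrete companion to the abstract, the sketch below shows one way a runtime could combine the two concerns it names: tasks carry atomic dependence counters, each worker drains its own queue in LIFO order for cache reuse, and idle workers steal from other queues for load balance. All names (`Task`, `Worker`, `worker_loop`) are hypothetical; this is a minimal sketch of the general technique, not the framework described in the paper.

```cpp
#include <atomic>
#include <deque>
#include <functional>
#include <mutex>
#include <thread>
#include <vector>

// A task becomes ready when all of its incoming dependence edges are satisfied.
struct Task {
    std::function<void()> run;        // the matrix kernel to execute
    std::vector<Task*> successors;    // tasks that consume this task's output
    std::atomic<int> unmet_deps{0};   // incoming edges not yet satisfied
};

// One queue per worker; a lock keeps the sketch simple (a real runtime
// would likely use a lock-free deque).
struct Worker {
    std::deque<Task*> queue;
    std::mutex m;

    void push(Task* t) { std::lock_guard<std::mutex> g(m); queue.push_back(t); }
    Task* pop() {                      // LIFO from our own end: cache reuse
        std::lock_guard<std::mutex> g(m);
        if (queue.empty()) return nullptr;
        Task* t = queue.back(); queue.pop_back(); return t;
    }
    Task* steal() {                    // FIFO from the victim's other end
        std::lock_guard<std::mutex> g(m);
        if (queue.empty()) return nullptr;
        Task* t = queue.front(); queue.pop_front(); return t;
    }
};

// Each thread drains its own queue first and steals only when idle.
void worker_loop(int id, std::vector<Worker>& workers, std::atomic<int>& remaining) {
    while (remaining.load() > 0) {
        Task* t = workers[id].pop();
        for (int v = 0; !t && v < (int)workers.size(); ++v)
            if (v != id) t = workers[v].steal();
        if (!t) { std::this_thread::yield(); continue; }
        t->run();
        for (Task* s : t->successors)
            if (s->unmet_deps.fetch_sub(1) == 1)  // last dependence satisfied
                workers[id].push(s);              // keep producer and consumer local
        remaining.fetch_sub(1);
    }
}

int main() {
    const int P = 4;
    std::vector<Worker> workers(P);

    Task a, b;                         // a two-task chain: a must precede b
    a.run = []{ /* e.g., factor a diagonal block */ };
    b.run = []{ /* e.g., update a block that depends on it */ };
    a.successors.push_back(&b);
    b.unmet_deps = 1;

    std::atomic<int> remaining{2};
    workers[0].push(&a);               // only the root task starts out ready

    std::vector<std::thread> pool;
    for (int i = 0; i < P; ++i)
        pool.emplace_back(worker_loop, i, std::ref(workers), std::ref(remaining));
    for (auto& th : pool) th.join();
}
```

Pushing a newly ready successor onto the finishing worker's own queue is the locality heuristic: the successor's operands are likely still warm in that worker's cache.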
Similar Papers
Runtime Data Flow Graph Scheduling of Matrix Computations with Multiple Hardware Accelerators
Abstract: In our previous work, we presented a systematic methodology for parallelizing dense matrix computations based on a separation of concerns between the code that implements a linear algebra algorithm and a runtime system that exploits parallelism, in which only relatively simple scheduling algorithms were needed to parallelize a wide range of dense matrix computations. We have extended t...
Scheduling algorithms-by-blocks on small clusters
The arrival of multicore architectures has generated an interest in reformulating dense matrix computations as algorithms-by-blocks, where submatrices are the units of data and computations with those blocks are the units of computation. Rather than executing such an algorithm directly, a directed acyclic graph (DAG) is generated at runtime and then scheduled by a runtime system like SuperMatrix. T...
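To illustrate the algorithms-by-blocks idea, the hypothetical sketch below records, rather than executes, the kernel invocations of a right-looking blocked Cholesky factorization; the recorded list is what a runtime such as SuperMatrix would turn into a DAG and schedule. The names (`BlockOp`, `enqueue`, `chol_by_blocks`) are illustrative assumptions, not any actual interface.

```cpp
#include <cstdio>
#include <string>
#include <vector>

// Each b-by-b submatrix is the unit of data; each kernel call on blocks
// is the unit of computation, recorded here instead of executed.
struct BlockOp {
    std::string kernel;     // e.g. "POTRF", "TRSM", "SYRK/GEMM"
    int i, j, k;            // block indices identifying the operands
};
static std::vector<BlockOp> g_pending;   // the recorded task list

static void enqueue(const std::string& kernel, int i, int j, int k) {
    g_pending.push_back({kernel, i, j, k});
}

// Right-looking blocked Cholesky over an nb-by-nb grid of blocks.
void chol_by_blocks(int nb) {
    for (int k = 0; k < nb; ++k) {
        enqueue("POTRF", k, k, k);                 // factor diagonal block
        for (int i = k + 1; i < nb; ++i)
            enqueue("TRSM", i, k, k);              // solve panel blocks
        for (int i = k + 1; i < nb; ++i)
            for (int j = k + 1; j <= i; ++j)
                enqueue("SYRK/GEMM", i, j, k);     // update trailing blocks
    }
}

int main() {
    chol_by_blocks(3);
    for (const BlockOp& op : g_pending)
        std::printf("%s A(%d,%d) step %d\n", op.kernel.c_str(), op.i, op.j, op.k);
}
```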
An Overview of the RAPID Run-time System for Parallel Irregular Computations
RAPID is a run-time system that uses an inspector/executor approach to parallelize irregular computations by embodying graph scheduling techniques to optimize interleaved communication and computation with mixed granularities. It provides a set of library functions for specifying irregular data objects and tasks that access these objects, extracts a task dependence graph from data access patter...
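The inspector/executor pattern can be sketched as follows: the inspector runs once over each task's declared reads and writes, derives dependence edges from last-writer information, and fixes a legal order; the executor then replays that order without re-analyzing. This is an illustrative reconstruction of the general pattern under assumed names (`IrregTask`, `inspect`, `execute`), not RAPID's API.

```cpp
#include <cstdio>
#include <functional>
#include <queue>
#include <vector>

struct IrregTask {
    std::vector<int> reads, writes;   // ids of the data objects touched
    std::function<void()> body;
};

// Inspector: add an edge from the last writer of an object to every later
// task that touches it, then topologically sort (Kahn's algorithm).
std::vector<int> inspect(const std::vector<IrregTask>& tasks, int nobjs) {
    int n = (int)tasks.size();
    std::vector<std::vector<int>> succ(n);
    std::vector<int> indeg(n, 0), last_writer(nobjs, -1);
    for (int t = 0; t < n; ++t) {
        auto depend_on_writer = [&](int obj) {
            int w = last_writer[obj];
            if (w >= 0 && w != t) { succ[w].push_back(t); ++indeg[t]; }
        };
        for (int obj : tasks[t].reads)  depend_on_writer(obj);
        for (int obj : tasks[t].writes) { depend_on_writer(obj); last_writer[obj] = t; }
    }
    std::vector<int> order;
    std::queue<int> ready;
    for (int t = 0; t < n; ++t) if (indeg[t] == 0) ready.push(t);
    while (!ready.empty()) {
        int t = ready.front(); ready.pop();
        order.push_back(t);
        for (int s : succ[t]) if (--indeg[s] == 0) ready.push(s);
    }
    return order;
}

// Executor: replay the schedule; in RAPID's setting this phase may run
// many times over the same schedule.
void execute(const std::vector<IrregTask>& tasks, const std::vector<int>& order) {
    for (int t : order) tasks[t].body();
}

int main() {
    std::vector<IrregTask> tasks = {
        {{}, {0}, []{ std::puts("produce object 0"); }},
        {{0}, {}, []{ std::puts("consume object 0"); }},
    };
    execute(tasks, inspect(tasks, /*nobjs=*/1));
}
```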
The Foundations of Thread-level Parallelism in the SuperMatrix Runtime System
In this paper, we describe the interface and implementation of the SuperMatrix runtime system. SuperMatrix exploits parallelism from matrix computations by mapping a linear algebra algorithm to a directed acyclic graph (DAG). We give detailed descriptions of how to dynamically construct a DAG where tasks consisting of matrix operations represent the nodes and data dependencies between tasks rep...
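Dynamic DAG construction of the kind described here can be sketched by keeping, per block, the last task that wrote it and the tasks that have read it since; comparing each newly submitted task against that state yields flow (read-after-write), anti (write-after-read), and output (write-after-write) dependence edges. The sketch below is an assumed illustration of that bookkeeping, not SuperMatrix's actual interface.

```cpp
#include <vector>

struct Node {
    std::vector<Node*> out;   // dependence edges to successor tasks
    int unmet = 0;            // number of incoming edges
};

struct BlockState {           // bookkeeping kept per submatrix block
    Node* last_writer = nullptr;
    std::vector<Node*> readers_since_write;
};

static void add_edge(Node* from, Node* to) {
    if (from && from != to) { from->out.push_back(to); ++to->unmet; }
}

// Submit a task with the given read and write sets; returns its DAG node,
// which is ready to execute once unmet drops to zero.
Node* submit(std::vector<BlockState*> reads, std::vector<BlockState*> writes) {
    Node* t = new Node;
    for (BlockState* b : reads) {                 // flow dependence (RAW)
        add_edge(b->last_writer, t);
        b->readers_since_write.push_back(t);
    }
    for (BlockState* b : writes) {
        add_edge(b->last_writer, t);              // output dependence (WAW)
        for (Node* r : b->readers_since_write)    // anti dependence (WAR)
            add_edge(r, t);
        b->last_writer = t;
        b->readers_since_write.clear();
    }
    return t;
}

int main() {
    BlockState a;                                 // one shared block
    Node* w = submit({}, {&a});                   // writer of the block
    Node* r = submit({&a}, {});                   // reader: RAW edge w -> r
    return (w->out.size() == 1 && r->unmet == 1) ? 0 : 1;
}
```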
Scaling Up Matrix Computations on Shared-Memory
While the growing number of cores per chip allows researchers to solve larger scientific and engineering problems, the parallel efficiency of the deployed parallel software starts to decrease. This scalability problem affects both vendor-provided and open-source software and wastes CPU cycles and energy. Expecting CPUs with hundreds of cores to be imminent, we have designed a new framewo...